Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic emulation of getcwd/chdir #214

Merged
merged 11 commits into from
Nov 30, 2020
Merged

Conversation

alexcrichton
Copy link
Collaborator

This commit adds basic emulation of a current working directory to
wasi-libc. The getcwd and chdir symbols are now implemented and
available for use. The getcwd implementation is pretty simple in that
it just copies out of a new global, __wasilibc_cwd, which defaults to
"/". The chdir implementation is much more involved and has more
ramification, however.

A new function, make_absolute, was added to the preopens object. Paths
stored in the preopen table are now always stored as absolute paths
instead of relative paths, and initial relative paths are interpreted as
being relative to /. Looking up a path to preopen now always turns it
into an absolute path, relative to the current working directory, and an
appropriate path is then returned.

The signature of __wasilibc_find_relpath has changed as well. It now
returns two path components, one for the absolute part and one for the
relative part. Additionally the relative part is always dynamically
allocated since it may no longer be a substring of the original input
path.

This has been tested lightly against the Rust standard library so far,
but I'm not a regular C developer so there's likely a few things to
improve!

Copy link
Member

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This looks great. Should make a lot more programs runnable.

libc-bottom-half/headers/public/wasi/libc-find-relpath.h Outdated Show resolved Hide resolved
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Here's an initial round of coments. I haven't yet figured out what to think about calling malloc+free on every open/stat/etc.; see below for details.

libc-bottom-half/headers/public/wasi/libc-find-relpath.h Outdated Show resolved Hide resolved
libc-top-half/musl/src/unistd/getcwd.c Outdated Show resolved Hide resolved
libc-top-half/musl/src/unistd/getcwd.c Outdated Show resolved Hide resolved
libc-top-half/musl/src/unistd/chdir.c Outdated Show resolved Hide resolved
libc-top-half/musl/src/unistd/chdir.c Outdated Show resolved Hide resolved
libc-bottom-half/sources/posix.c Outdated Show resolved Hide resolved
This commit adds basic emulation of a current working directory to
wasi-libc. The `getcwd` and `chdir` symbols are now implemented and
available for use. The `getcwd` implementation is pretty simple in that
it just copies out of a new global, `__wasilibc_cwd`, which defaults to
`"/"`. The `chdir` implementation is much more involved and has more
ramification, however.

A new function, `make_absolute`, was added to the preopens object. Paths
stored in the preopen table are now always stored as absolute paths
instead of relative paths, and initial relative paths are interpreted as
being relative to `/`. Looking up a path to preopen now always turns it
into an absolute path, relative to the current working directory, and an
appropriate path is then returned.

The signature of `__wasilibc_find_relpath` has changed as well. It now
returns two path components, one for the absolute part and one for the
relative part. Additionally the relative part is always dynamically
allocated since it may no longer be a substring of the original input
path.

This has been tested lightly against the Rust standard library so far,
but I'm not a regular C developer so there's likely a few things to
improve!
@alexcrichton
Copy link
Collaborator Author

Ok I've pushed up a second commit which should amortize the calls to malloc and reuse internal buffers where possible. I didn't register atexit handlers yet, though, but that may not be too too important in the context of wasm.

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing the malloc buffer looks good!

This patch does increase code size of simple programs using open by about 4K. While that's within noise for some users, it's noticeable for others.

This patch increases the size of the [WASI tutorial C example](https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-tutorial.md#from-c) compiled with `-lc-printscan-no-floating-point` from 24325 to 28359 bytes. Of that, 8690 bytes come from `malloc` etc. and 5148 comes from `printf` etc.

I'm considering whether we should move cwd emulation into a separate library. Similar to how emulated mmap requires compiling with -D_WASI_EMULATED_MMAN and linking with -lwasi-emulated-mman, we could make getcwd and chdir require compiling with -D_WASI_EMULATED_CWD and linking with -lwasi-emulated-cwd. Implementing this is mostly Makefile mechanics, which I can do separately, so we don't need to block this PR on it, but I am interested in whether anyone has opinions about this approach.

libc-bottom-half/sources/chdir.c Show resolved Hide resolved
libc-bottom-half/sources/posix.c Outdated Show resolved Hide resolved
libc-bottom-half/sources/preopens.c Outdated Show resolved Hide resolved
libc-bottom-half/sources/chdir.c Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Collaborator Author

This is something I'd like to enable by default with rust-lang/rust, so if binary size is a concern I think it would be better to handle this for every program rather than just those that don't opt-in. I'd also imagine that almost all programs would want to opt-in given how prevalent the concept of a current directory is. I feel like though this sort of a consequence of the wasi-libc strategy, which is that if we want to put stuff in wasm rather than WASI APIs that's inevitably going to make modules larger.

Is there a way to do weak linking tricks or something like that to only pull in this extra processing when chdir is used? If chdir is never used then there's really no need for most of this processing since the current directory is always /.

@sbc100
Copy link
Member

sbc100 commented Aug 5, 2020

This is something I'd like to enable by default with rust-lang/rust, so if binary size is a concern I think it would be better to handle this for every program rather than just those that don't opt-in. I'd also imagine that almost all programs would want to opt-in given how prevalent the concept of a current directory is. I feel like though this sort of a consequence of the wasi-libc strategy, which is that if we want to put stuff in wasm rather than WASI APIs that's inevitably going to make modules larger.

Is there a way to do weak linking tricks or something like that to only pull in this extra processing when chdir is used? If chdir is never used then there's really no need for most of this processing since the current directory is always /.

I think that might be possible yes. I think you would want to co-locate chdir and find_relpath in the same object file, then have callers of find_relpath reference it as weak symbol (maybe you could just move find_relpath2 so only once place would need to check for the weak sym).

@sbc100
Copy link
Member

sbc100 commented Aug 5, 2020

This is something I'd like to enable by default with rust-lang/rust, so if binary size is a concern I think it would be better to handle this for every program rather than just those that don't opt-in. I'd also imagine that almost all programs would want to opt-in given how prevalent the concept of a current directory is. I feel like though this sort of a consequence of the wasi-libc strategy, which is that if we want to put stuff in wasm rather than WASI APIs that's inevitably going to make modules larger.

Perhaps if its very common then it could be opt-out instead, so tiny programs could link with -D_WASI_EMULATED_CWD=0? Although maybe your weak linking idea is preferable?

@alexcrichton
Copy link
Collaborator Author

Ok so I'm having a really hard time wrangling weak symbols to do what I want. Inevitably they never seem to work for me...

First I refactored a bit where I defined a bunch of weak functions in places and then defined a copy of each function in chdir.c. @sunfishcode's sample program still, however, pulled in the symbols from chdir.c. I think the problem was that a reference to the symbols were made (e.g. posix.c depended on something weakly defined in preopens.c), and the linker found a strong version of the symbol in chdir.c and used that instead.

Next I refactored everything so the meat was defined in chdir.c and it was only referenced weakly from posix.c and preopens.c. The linker, however, still pulled in chdir.c since it seemed to see a reference to the weak symbol and attempted to resolve it.

Basically nothing I did was able to actually work. No matter what I did the linker kept pulling in the strong versions defined in chdir.c, even though chdir was not a referenced symbol. I feel like what I want is to place the chdir.o object into an archive, but I want the archive's symbol table to only list chdir for that object, nothing else defined in there. That way the linker, if it searched for symbols, wouldn't find chdir.c-defined-symbols unless chdir was referenced. I don't think this is an easy feature to implement, however.

I looked a bit though and of the 4k increase it looks like 3k is due to realloc. I changed all the realloc calls to free + malloc and the size increase is only 1k at that point.

I'm not really sure how much this is all worth it in these sense of any nontrivial application is almost surely going to have realloc in it already. Do y'all have other ideas for weak symbols though?

@sbc100
Copy link
Member

sbc100 commented Aug 6, 2020

The linker, however, still pulled in chdir.c since it seemed to see a reference to the weak symbol and attempted to resolve it.

This part is confusing me. A weak reference to symbol in chdir.o should not cause the linker to pulling chdir.o if chdir.o is part of an archive file.

Are you declaring the function like this within posix.c/preopens.c?

__attribute__((weak)) void foo(void); 

?

To aid debugging you can try -Wl,--trace-symbol=foo or -Wl,--verbose or if you really want to see all the gory details you can do -Wl,-mllvm,-debug-only=lld

@sbc100
Copy link
Member

sbc100 commented Aug 6, 2020

Oh.. I see the problem. The confusion is between weak references and weakly defined symbol. They are confusingly quite different and have completely different uses.

In this case you want weak references to strongly (normal) defined functions.

I think of it like this:

  • weak reference: I don't pull objects out of archives. (And could be NULL at runtime so check me before calling!)
  • weak symbol: I'm ok with another definition of this symbol overriding me.

We really should not have re-used the word weak for these two different concepts.

@alexcrichton
Copy link
Collaborator Author

In preopens.c I've got:

int __wasilibc_find_relpath_alloc(
const char *path,
const char **abs,
char **relative,
size_t *relative_len,
int can_realloc
) __attribute__((weak));

and it's called like:

    if (__wasilibc_find_relpath_alloc)
        return __wasilibc_find_relpath_alloc(path, abs_prefix, relative_path, &relative_path_len, 0); 

and in chdir.c I have the symbol actually defined, but when linking I get:

$ /opt/wasi-sdk/bin/clang foo.c --sysroot ./sysroot -o foo.wasm -Wl,--trace-symbol=__wasilibc_find_relpath_alloc
./sysroot/lib/wasm32-wasi/libc.a: lazy definition of __wasilibc_find_relpath_alloc
./sysroot/lib/wasm32-wasi/libc.a(preopens.o): reference to __wasilibc_find_relpath_alloc
./sysroot/lib/wasm32-wasi/libc.a(chdir.o): definition of __wasilibc_find_relpath_alloc

@alexcrichton
Copy link
Collaborator Author

(I can gist the full patch too if it helps, it's just not 100% finished yet)

@sbc100
Copy link
Member

sbc100 commented Aug 6, 2020

Can you try --trace-symbol chdir to see if anyone else is pulling that object in via that symbol?

@alexcrichton
Copy link
Collaborator Author

That just prints out

./sysroot/lib/wasm32-wasi/libc.a: lazy definition of chdir
./sysroot/lib/wasm32-wasi/libc.a(chdir.o): definition of chdir

@alexcrichton
Copy link
Collaborator Author

Ok so poking around a bit more, if I move chdir.o to the end of libc.a, then __wasilibc_find_relpath_alloc is not pulled in unless chdir is used. If chdir.o comes before preopens.o (which it does on my system) then __wasilibc_find_relpath_alloc is pulled in no matter what.

@sbc100 is this perhaps a bug in the wasm-ld?

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2020

Certainly sounds very odd. Are there any symbols that those object file both define? Otherwise I can't think of reason why ordering within an archive could/should matter.

I would love to take a look at a repro for this.. do you repro steps you could include here?

@alexcrichton
Copy link
Collaborator Author

Ok I've pushed up a commit which I think does the weak symbol business. The locally of the example program increases from 38686 to 39021, an increase of 335 bytes.

@sbc100 there's some makefile tweaks here, and if those makefile tweaks are removed then this example program increases 4k in size (pulls in __wasilibc_find_relpath_alloc)

@sbc100
Copy link
Member

sbc100 commented Aug 7, 2020

wasm-ld fix: https://reviews.llvm.org/D85567

@alexcrichton
Copy link
Collaborator Author

Nice! I've updated comments in the Makefile about the build logic around chdir.c to reference that change.

sbc100 added a commit to llvm/llvm-project that referenced this pull request Aug 10, 2020
…e) version is see first

When a weak reference of a lazy symbol occurs we were not correctly
updating the lazy symbol.  We need to tag the existing lazy symbol
as weak and, in the case of a function symbol, give it a signature.

Without the signature we can't then create the dummy function which
is needed when an weakly undefined function is called.

We had tests for weakly referenced lazy symbols but we were only
tests in the case where the reference was seen before the lazy
symbol.

See: WebAssembly/wasi-libc#214

Differential Revision: https://reviews.llvm.org/D85567
Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for putting together all the pieces for this! Just a few more comments:

libc-bottom-half/sources/preopens.c Outdated Show resolved Hide resolved
libc-bottom-half/sources/chdir.c Show resolved Hide resolved
libc-bottom-half/sources/chdir.c Outdated Show resolved Hide resolved
@alexcrichton
Copy link
Collaborator Author

Sorry for the delay, but should be updated now!

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@sunfishcode sunfishcode merged commit 5b148b6 into WebAssembly:master Nov 30, 2020
@sunfishcode sunfishcode mentioned this pull request Nov 30, 2020
arichardson pushed a commit to arichardson/llvm-project that referenced this pull request Mar 22, 2021
…e) version is see first

When a weak reference of a lazy symbol occurs we were not correctly
updating the lazy symbol.  We need to tag the existing lazy symbol
as weak and, in the case of a function symbol, give it a signature.

Without the signature we can't then create the dummy function which
is needed when an weakly undefined function is called.

We had tests for weakly referenced lazy symbols but we were only
tests in the case where the reference was seen before the lazy
symbol.

See: WebAssembly/wasi-libc#214

Differential Revision: https://reviews.llvm.org/D85567
mem-frob pushed a commit to draperlaboratory/hope-llvm-project that referenced this pull request Oct 7, 2022
…e) version is see first

When a weak reference of a lazy symbol occurs we were not correctly
updating the lazy symbol.  We need to tag the existing lazy symbol
as weak and, in the case of a function symbol, give it a signature.

Without the signature we can't then create the dummy function which
is needed when an weakly undefined function is called.

We had tests for weakly referenced lazy symbols but we were only
tests in the case where the reference was seen before the lazy
symbol.

See: WebAssembly/wasi-libc#214

Differential Revision: https://reviews.llvm.org/D85567
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants